Towards a Large Scale Concept Ontology for Broadcast Video

Author

  • Alexander G. Hauptmann
Abstract

Earlier this year, a major effort was initiated to study the theoretical and empirical aspects of the automatic detection of semantic concepts in broadcast video, complementing ongoing research in video analysis, the TRECVID video analysis evaluations by the U.S. National Institute of Standards and Technology (NIST), and MPEG-7 standardization. The video analysis community has long struggled to bridge the gap from successful low-level feature analysis (color histograms, texture, shape) to semantic content description of video. One approach is to use a set of intermediate textual descriptors that can be reliably applied to visual scenes (e.g. outdoors, faces, animals). If we can define a rich enough set of such intermediate descriptors in the form of large lexicons and taxonomic classification schemes, these descriptors will enable robust, general-purpose semantic content annotation and retrieval. Our effort is broad in scope, because our subject matter is broadcast video, which is almost unrestricted in content and also includes audio and spoken dialog. In addition, broadcast video has an added layer of 'editing': shots and scenes are carefully chosen to make a point, so, unlike surveillance video, they do not directly reflect reality. We are exploring to what extent broadcast video is amenable to a structured characterization of content using a large but well-defined lexicon. By necessity, the lexicon will have to be general and broadly applicable, since it is impossible to give in-depth characterizations of shots across such broad subject matter. But which lexicon items would allow a sufficiently rich and general description of the content of broadcast news, and would in effect constitute a general-purpose ontology for describing video content?
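The idea of intermediate descriptors organized in a taxonomic classification scheme can be illustrated with a small data structure. The Python sketch below is a minimal illustration only, assuming a simple tree in which each concept has at most one parent (a real ontology may allow several); the `Lexicon` class and the concept names are hypothetical. Labeling a shot with a specific concept then implies all of its more general ancestors.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Lexicon:
    """Hypothetical concept lexicon stored as a tree of parent links."""

    # concept name -> parent concept name (None marks a root concept)
    parent: Dict[str, Optional[str]] = field(default_factory=dict)

    def add(self, concept: str, parent: Optional[str] = None) -> None:
        # New concepts only, attached to an existing parent: this keeps the taxonomy a tree (no cycles).
        if concept in self.parent:
            raise ValueError(f"concept already defined: {concept}")
        if parent is not None and parent not in self.parent:
            raise KeyError(f"unknown parent concept: {parent}")
        self.parent[concept] = parent

    def ancestors(self, concept: str) -> List[str]:
        # Follow parent links up to a root, nearest ancestor first.
        chain: List[str] = []
        node = self.parent[concept]
        while node is not None:
            chain.append(node)
            node = self.parent[node]
        return chain

    def expand(self, labels: Set[str]) -> Set[str]:
        # A shot labeled with a specific concept also carries every more general one.
        expanded = set(labels)
        for label in labels:
            expanded.update(self.ancestors(label))
        return expanded


# Usage with hypothetical concept names:
lexicon = Lexicon()
lexicon.add("animal")
lexicon.add("dog", parent="animal")
lexicon.add("outdoors")
print(sorted(lexicon.expand({"dog", "outdoors"})))  # ['animal', 'dog', 'outdoors']
```

Propagating labels upward means a query for a general descriptor such as "animal" can also match shots that were annotated only with a more specific one.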
Our first challenge is to find a large, broad set of descriptors that will be useful across a wide variety of broadcast news content. Together with librarians, video archive specialists, and experts in multimedia analysis and knowledge representation, we are defining an ontology and creating a vocabulary of about a thousand lexical terms that describe the content of broadcast video. Once we have such semantic descriptors, the next question is whether they are actually useful. We will explore the sufficiency and generality of this set of descriptors for annotation and retrieval of video content. One aspect of this work is to determine empirically whether these descriptors can feasibly be identified automatically in appropriate video content. We will also annotate larger amounts of video, both to see whether the derived set of descriptors is appropriate over a wide range of content and to provide ground truth for an annotated video library. Finding 1000 concepts represented in broadcast news video that can be detected and evaluated requires careful lexicon design. The concepts in the lexicon should be useful from the perspective of exploiting visual information; at the same time, the lexicon must be feasible from the perspective of automatic and semi-automatic detection. The design of the lexicon thus needs to bring together members of the library science and knowledge representation communities as well as researchers from the multimedia analysis community. The confluence of statistical and non-statistical media analysis with ontologies, classification schemes, and lexicons helps place the problem of scalable multimedia semantic concept detection in its proper context. Context-sensitive concept detection can also help improve both detection performance and scalability.
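To illustrate how automatically detected descriptors could support retrieval, the sketch below maps a text query onto lexicon concepts and ranks shots by the detector confidence for those concepts. It is a minimal sketch under stated assumptions, not the method of this effort: the term-to-concept table, the per-shot detector confidences, and the averaging rule are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical detector output: shot id -> {concept name: confidence in [0, 1]}.
ShotScores = Dict[str, Dict[str, float]]


def map_query(query: str, term_to_concept: Dict[str, str]) -> List[str]:
    # Naive mapping: keep only query words listed in a hypothetical term-to-concept table.
    return [term_to_concept[w] for w in query.lower().split() if w in term_to_concept]


def rank_shots(concepts: List[str], scores: ShotScores) -> List[Tuple[str, float]]:
    # Score each shot by its mean confidence over the query concepts;
    # a concept the detectors did not report for a shot counts as 0.
    if not concepts:
        return []
    ranked = [
        (shot, sum(conf.get(c, 0.0) for c in concepts) / len(concepts))
        for shot, conf in scores.items()
    ]
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


# Usage with made-up numbers:
table = {"animals": "animal", "outdoors": "outdoors", "outside": "outdoors"}
scores = {
    "shot1": {"animal": 0.9, "outdoors": 0.8},
    "shot2": {"animal": 0.2, "outdoors": 0.9},
    "shot3": {"face": 0.95},
}
concepts = map_query("Animals outdoors", table)  # ['animal', 'outdoors']
print([shot for shot, _ in rank_shots(concepts, scores)])  # ['shot1', 'shot2', 'shot3']
```

Averaging is only one possible fusion rule; weighting each concept by how reliably it can be detected is one way detection accuracy could be made to influence retrieval.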
In designing and evaluating a large-scale lexicon, the following challenges need to be tackled:
  • Interpreting user needs: finding out what users want from video archives of broadcast news.
  • Rigorous experiments to understand how user needs can be mapped onto the components of the lexicon.
  • Study of the performance of automatic concept detection systems and its impact on retrieval performance, and classification of concepts on the basis of performance and relevance.
  • Understanding algorithmic approaches for large-scale concept detection.
  • Empirical and theoretical study of how detection performance trade-offs affect the ultimate usability of the lexicon, specifically evaluating detection accuracy vs. retrieval performance.
A year-long workshop is under way, developing recommendations and general criteria for the design of large-scale lexicons for audio-visual content classification in support of systems for searching, filtering, and mining broadcast video. Our approach is to start with a large, fixed collection of data and explore different types of annotation and lexical labeling for retrieval and description. If successful, the resulting lexicon and ontology will provide a basis for future generations of broadcast news video retrieval and annotation work.
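Comparing detection accuracy with retrieval performance requires a measure for ranked lists of shots. A common choice in TRECVID-style evaluations is non-interpolated average precision; the sketch below is a minimal Python version with hypothetical shot ids, assuming each shot appears at most once in the ranking. The same function can score a single concept detector's ranking or the ranking returned for a search query, given the set of shots judged relevant.

```python
from typing import Sequence, Set


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Non-interpolated average precision of a ranked list of unique shot ids."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, shot in enumerate(ranked, start=1):
        if shot in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at the rank of this relevant shot
    # Relevant shots that never appear in the list contribute precision 0.
    return precision_sum / len(relevant)


# Relevant shots at ranks 1 and 3: (1/1 + 2/3) / 2, about 0.83.
print(average_precision(["s1", "s2", "s3", "s4"], {"s1", "s3"}))
```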

Publication date: 2004